For curves, finding the best-fitting curve is a complicated mathematical problem. What's nice about straight-line regression is that it's simple enough that you can calculate the least-squares parameters from explicit formulas. If you're interested (or if your professor insists that you're interested), we present a general outline of how those formulas are derived.
Think of a set of data containing $X_i$ and $Y_i$ values, in which $i$ is an index that identifies each observation in the set, as described in Chapter 2. From those data, SSQ can be calculated like this:

$$\mathrm{SSQ} = \sum_i \left( Y_i - (a + bX_i) \right)^2$$
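To make that formula concrete, here's a minimal Python sketch (our illustration, not part of the formal derivation) that evaluates SSQ for any candidate intercept $a$ and slope $b$:

```python
def ssq(a, b, x, y):
    """Sum of squared residuals for the candidate line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Example: how badly does the line y = 1 + 2x fit three data points?
print(ssq(1.0, 2.0, [1, 2, 3], [3.1, 4.9, 7.2]))
```

Least-squares regression amounts to finding the particular $a$ and $b$ that make this number as small as possible.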
If you’re good at first-semester calculus, you can find the values of a and b that
minimize SSQ by setting the partial derivatives of SSQ with respect to a and b
equal to 0. If you stink at calculus, trust that this leads to these two simultaneous
equations:
$$aN + b\left(\sum X\right) = \sum Y$$

$$a\left(\sum X\right) + b\left(\sum X^2\right) = \sum XY$$
where N is the number of observed data points.
These equations can be solved for a and b:
$$a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{N\left(\sum X^2\right) - \left(\sum X\right)^2}$$

$$b = \frac{\left(\sum XY\right) - a\left(\sum X\right)}{\sum X^2}$$
See Chapter 2 if you don’t feel comfortable reading the mathematical notations or
expressions in this section.
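If you'd like to see those formulas in action, here's a short Python sketch (again, just our illustration) that computes $a$ and $b$ directly from the summations:

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x,
    computed from the explicit summation formulas above."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    a = (sum_y * sum_xx - sum_x * sum_xy) / (n * sum_xx - sum_x ** 2)
    b = (sum_xy - a * sum_x) / sum_xx  # second normal equation, rearranged
    return a, b

# Example: four points that lie close to the line y = 1 + 2x
a, b = fit_line([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])
print(a, b)  # roughly 1.15 and 1.94
```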
Running a Straight-Line Regression
Even though it's possible to calculate a straight-line regression manually or with a calculator, it's not a good idea. You'll go crazy trying to evaluate all those summations and other calculations, and you'll almost certainly make a mistake somewhere along the way.
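Statistical software handles all of that for you. As one example (assuming you're working in Python with the SciPy package installed), scipy.stats.linregress runs the whole straight-line regression in a single call:

```python
from scipy import stats

x = [1, 2, 3, 4]
y = [3.1, 4.9, 7.2, 8.8]

# linregress returns the slope and intercept, plus the correlation
# coefficient, the p-value, and the standard error of the slope
result = stats.linregress(x, y)
print(result.intercept, result.slope)  # the same a and b the formulas give
```

Any other statistics package (R, SAS, SPSS, or even a spreadsheet) can do the same job; the point is to let the computer grind through the summations for you.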